Facing data scarcity using variable feature vector dimension
Authors
Abstract
This paper focuses on three key issues in intonation modelling: interpolation of the fundamental frequency (F0) contour, sentence-by-sentence parameter extraction, and data scarcity. In some cases, these issues introduce noise and inconsistency into the training data, reducing the performance of machine learning techniques. We assume that the F0 contour is segmented into prosodic units (such as accent groups, minor phrases, etc.). Each segment of the F0 contour has a corresponding feature vector with linguistic and non-linguistic components. We propose to address these limitations with a technique based on clustering using different feature vector dimensions. The clustering of the feature vectors also produces a partition of the F0 contour space. The proposal consists of a procedure that selects the dimension yielding the best predicted fundamental frequency contour, in the RMSE sense, compared with a reference contour. Experimental results show an improvement over other approaches.
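The selection procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used, how candidate dimensions are formed, or how contours are represented, so everything below is an assumption: plain k-means stands in for the clustering step, a candidate dimension `d` means "the first `d` feature components", each F0 segment is assumed to be resampled to a fixed length, and the names `kmeans`, `rmse`, and `select_dimension` are hypothetical.

```python
import numpy as np


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def rmse(predicted, reference):
    """Root mean squared error between two arrays of F0 values."""
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))


def select_dimension(feats_train, f0_train, feats_val, f0_val, dims, k=4):
    """Return (best_dim, scores); scores maps each dimension to validation RMSE.

    feats_*: (n_segments, n_features) feature vectors.
    f0_*:    (n_segments, contour_len) fixed-length F0 contours (assumption).
    """
    scores = {}
    global_mean = f0_train.mean(axis=0)
    for d in dims:
        x_train, x_val = feats_train[:, :d], feats_val[:, :d]
        centroids, labels = kmeans(x_train, k)
        # The feature-space partition induces a partition of the F0 contours:
        # each cluster is represented here by its mean contour.
        contours = np.stack([
            f0_train[labels == j].mean(axis=0) if np.any(labels == j)
            else global_mean
            for j in range(k)
        ])
        # Predict each validation contour from its nearest feature centroid.
        dists = np.linalg.norm(
            x_val[:, None, :] - centroids[None, :, :], axis=2)
        predicted = contours[dists.argmin(axis=1)]
        scores[d] = rmse(predicted, f0_val)
    best_dim = min(scores, key=scores.get)
    return best_dim, scores
```

Calling `select_dimension(feats_train, f0_train, feats_val, f0_val, dims=[1, 2, 5])` returns the candidate dimension with the lowest RMSE against the reference contours, together with the per-dimension scores. Scoring on held-out segments rather than on the training segments is a design choice of this sketch, not something stated in the abstract.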
Similar resources
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data give access to a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...
Variable Selection as an Instance-Based Ontology Mapping Strategy
The paper presents a novel instance-based approach to aligning concepts taken from two heterogeneous ontologies populated with text documents. We introduce a concept similarity measure based on the size of the intersection of the sets of variables which are most important for the class separation of the instances in both input ontologies. We suggest a VC dimension variable selection criterion e...
Variable Dimension Vector Quantization of Speech Spectra for Low Rate Vocoders
Optimal vector quantization of variable-dimension vectors in principle is feasible by using a set of fixed dimension VQ codebooks. However, for typical applications, such a multi-codebook approach demands a grossly excessive and impractical storage and computational complexity. Efficient quantization of such variable-dimension spectral shape vectors is the most challenging and difficult encodin...
Applying Genetic Algorithm to EEG Signals for Feature Reduction in Mental Task Classification
Brain-computer interface (BCI) systems are a new mode of communication that provides a new path between the brain and its surroundings by processing EEG signals measured in different mental states. Choosing suitable features is therefore required for good BCI communication. In this regard, one point to consider is feature vector dimensionality. We present a method of feature reduction us...
F0 feature extraction by polynomial regression function for monosyllabic Thai tone recognition
This paper presents a monosyllabic Thai tone recognition system. The system is composed of three main processes: fundamental frequency (F0) extraction from the input speech signal, analysis of the F0 contour for feature extraction, and classification of each tone using the extracted features. In the F0 feature extraction, polynomial regression functions are employed to fit the segmented F0 curve wh...